## birth exit hospstay lowph
## Min. :81.51 Min. :68.53 Min. :-6574.00 Min. :6.530
## 1st Qu.:83.52 1st Qu.:83.58 1st Qu.: 16.00 1st Qu.:7.130
## Median :84.90 Median :84.96 Median : 37.00 Median :7.210
## Mean :84.75 Mean :84.84 Mean : 40.36 Mean :7.202
## 3rd Qu.:86.07 3rd Qu.:86.17 3rd Qu.: 62.00 3rd Qu.:7.310
## Max. :87.48 Max. :96.87 Max. : 3668.00 Max. :7.550
## NA's :21 NA's :31 NA's :31 NA's :62
## pltct race bwt gest
## Min. : 16.0 black :369 Min. : 400 Min. :22.00
## 1st Qu.:143.0 native American: 16 1st Qu.: 900 1st Qu.:27.00
## Median :202.0 oriental : 4 Median :1120 Median :29.00
## Mean :201.6 white :257 Mean :1094 Mean :28.87
## 3rd Qu.:252.0 NA's : 25 3rd Qu.:1310 3rd Qu.:31.00
## Max. :571.0 Max. :1580 Max. :40.00
## NA's :70 NA's :2 NA's :4
## inout twn lol magsulf
## born at Duke:547 Min. :0.0000 Min. : 0.000 Min. :0.0000
## transported :121 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:0.0000
## NA's : 3 Median :0.0000 Median : 3.500 Median :0.0000
## Mean :0.2074 Mean : 8.438 Mean :0.1344
## 3rd Qu.:0.0000 3rd Qu.: 9.000 3rd Qu.:0.0000
## Max. :1.0000 Max. :192.000 Max. :1.0000
## NA's :20 NA's :381 NA's :247
## meth toc delivery apg1
## Min. :0.0000 Min. :0.0000 abdominal:314 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 vaginal :335 1st Qu.:2.000
## Median :0.0000 Median :0.0000 NA's : 22 Median :5.000
## Mean :0.4372 Mean :0.2248 Mean :4.903
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:7.000
## Max. :1.0000 Max. :1.0000 Max. :9.000
## NA's :106 NA's :106 NA's :34
## vent pneumo pda cld
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.5803 Mean :0.1969 Mean :0.2087 Mean :0.2694
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :30 NA's :26 NA's :29 NA's :66
## pvh ivh ipe year sex
## absent :360 absent :442 absent :472 Min. :81.51 female:320
## definite:125 definite: 75 definite: 38 1st Qu.:83.52 male :330
## possible: 41 possible: 10 possible: 17 Median :84.91 NA's : 21
## NA's :145 NA's :144 NA's :144 Mean :84.76
## 3rd Qu.:86.07
## Max. :87.48
## NA's :21
## dead id
## Min. :0.0000 Min. : 1.0
## 1st Qu.:0.0000 1st Qu.:168.5
## Median :0.0000 Median :336.0
## Mean :0.2146 Mean :336.0
## 3rd Qu.:0.0000 3rd Qu.:503.5
## Max. :1.0000 Max. :671.0
##
| | |
|---|---|
| Name | Piped data |
| Number of rows | 671 |
| Number of columns | 27 |
| Column type frequency: | |
| factor | 7 |
| numeric | 20 |
| Group variables | None |

Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| race | 25 | 0.96 | FALSE | 4 | bla: 369, whi: 257, nat: 16, ori: 4 |
| inout | 3 | 1.00 | FALSE | 2 | bor: 547, tra: 121 |
| delivery | 22 | 0.97 | FALSE | 2 | vag: 335, abd: 314 |
| pvh | 145 | 0.78 | FALSE | 3 | abs: 360, def: 125, pos: 41 |
| ivh | 144 | 0.79 | FALSE | 3 | abs: 442, def: 75, pos: 10 |
| ipe | 144 | 0.79 | FALSE | 3 | abs: 472, def: 38, pos: 17 |
| sex | 21 | 0.97 | FALSE | 2 | mal: 330, fem: 320 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| birth | 21 | 0.97 | 84.75 | 1.60 | 81.51 | 83.52 | 84.90 | 86.07 | 87.48 | ▅▆▇▇▆ |
| exit | 31 | 0.95 | 84.84 | 1.79 | 68.53 | 83.58 | 84.96 | 86.17 | 96.87 | ▁▁▇▅▁ |
| hospstay | 31 | 0.95 | 40.36 | 304.84 | -6574.00 | 16.00 | 37.00 | 62.00 | 3668.00 | ▁▁▁▇▁ |
| lowph | 62 | 0.91 | 7.20 | 0.14 | 6.53 | 7.13 | 7.21 | 7.31 | 7.55 | ▁▁▃▇▂ |
| pltct | 70 | 0.90 | 201.62 | 80.55 | 16.00 | 143.00 | 202.00 | 252.00 | 571.00 | ▃▇▅▁▁ |
| bwt | 2 | 1.00 | 1093.89 | 265.22 | 400.00 | 900.00 | 1120.00 | 1310.00 | 1580.00 | ▂▅▆▇▅ |
| gest | 4 | 0.99 | 28.87 | 2.55 | 22.00 | 27.00 | 29.00 | 31.00 | 40.00 | ▂▇▆▁▁ |
| twn | 20 | 0.97 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| lol | 381 | 0.43 | 8.44 | 19.26 | 0.00 | 0.00 | 3.50 | 9.00 | 192.00 | ▇▁▁▁▁ |
| magsulf | 247 | 0.63 | 0.13 | 0.34 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| meth | 106 | 0.84 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| toc | 106 | 0.84 | 0.22 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| apg1 | 34 | 0.95 | 4.90 | 2.63 | 0.00 | 2.00 | 5.00 | 7.00 | 9.00 | ▅▆▆▇▇ |
| vent | 30 | 0.96 | 0.58 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▆▁▁▁▇ |
| pneumo | 26 | 0.96 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| pda | 29 | 0.96 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| cld | 66 | 0.90 | 0.27 | 0.44 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| year | 21 | 0.97 | 84.76 | 1.60 | 81.51 | 83.52 | 84.91 | 86.07 | 87.48 | ▅▆▇▇▆ |
| dead | 0 | 1.00 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| id | 0 | 1.00 | 336.00 | 193.85 | 1.00 | 168.50 | 336.00 | 503.50 | 671.00 | ▇▇▇▇▇ |
| | |
|---|---|
| Name | Piped data |
| Number of rows | 531 |
| Number of columns | 20 |
| Column type frequency: | |
| factor | 4 |
| numeric | 16 |
| Group variables | None |

Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| race | 0 | 1 | FALSE | 4 | bla: 303, whi: 211, nat: 13, ori: 4 |
| inout | 0 | 1 | FALSE | 2 | bor: 448, tra: 83 |
| delivery | 0 | 1 | FALSE | 2 | vag: 269, abd: 262 |
| sex | 0 | 1 | FALSE | 2 | mal: 267, fem: 264 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| birth | 0 | 1 | 84.63 | 1.54 | 81.51 | 83.43 | 84.77 | 85.83 | 87.48 | ▅▆▇▇▅ |
| exit | 0 | 1 | 84.76 | 1.55 | 81.05 | 83.56 | 84.87 | 85.99 | 87.72 | ▂▆▇▇▅ |
| hospstay | 0 | 1 | 47.04 | 63.50 | -295.00 | 21.00 | 40.00 | 64.00 | 797.00 | ▁▇▁▁▁ |
| lowph | 0 | 1 | 7.22 | 0.13 | 6.53 | 7.13 | 7.22 | 7.32 | 7.55 | ▁▁▃▇▂ |
| pltct | 0 | 1 | 204.49 | 80.83 | 16.00 | 148.00 | 204.00 | 256.00 | 571.00 | ▂▇▅▁▁ |
| bwt | 0 | 1 | 1135.61 | 240.04 | 400.00 | 960.00 | 1160.00 | 1330.00 | 1500.00 | ▁▃▆▇▇ |
| gest | 0 | 1 | 29.25 | 2.21 | 23.00 | 28.00 | 29.00 | 31.00 | 36.00 | ▁▇▇▆▁ |
| twn | 0 | 1 | 0.21 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| apg1 | 0 | 1 | 5.02 | 2.65 | 0.00 | 2.00 | 5.00 | 7.00 | 9.00 | ▅▆▅▇▇ |
| vent | 0 | 1 | 0.54 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| pneumo | 0 | 1 | 0.18 | 0.38 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| pda | 0 | 1 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| cld | 0 | 1 | 0.26 | 0.44 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| year | 0 | 1 | 84.63 | 1.54 | 81.51 | 83.43 | 84.77 | 85.83 | 87.48 | ▅▆▇▇▅ |
| dead | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| id | 0 | 1 | 328.66 | 183.38 | 2.00 | 175.00 | 329.00 | 479.50 | 671.00 | ▇▇▇▇▆ |
First, convert the variables twn, apg1, vent, pneumo, pda, cld, dead, and id to factors.
| | |
|---|---|
| Name | Piped data |
| Number of rows | 531 |
| Number of columns | 20 |
| Column type frequency: | |
| factor | 12 |
| numeric | 8 |
| Group variables | None |

Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| race | 0 | 1 | FALSE | 4 | bla: 303, whi: 211, nat: 13, ori: 4 |
| inout | 0 | 1 | FALSE | 2 | bor: 448, tra: 83 |
| twn | 0 | 1 | FALSE | 2 | 0: 422, 1: 109 |
| delivery | 0 | 1 | FALSE | 2 | vag: 269, abd: 262 |
| apg1 | 0 | 1 | FALSE | 10 | 8: 91, 1: 74, 6: 70, 7: 69 |
| vent | 0 | 1 | FALSE | 2 | 1: 288, 0: 243 |
| pneumo | 0 | 1 | FALSE | 2 | 0: 438, 1: 93 |
| pda | 0 | 1 | FALSE | 2 | 0: 425, 1: 106 |
| cld | 0 | 1 | FALSE | 2 | 0: 393, 1: 138 |
| sex | 0 | 1 | FALSE | 2 | mal: 267, fem: 264 |
| dead | 0 | 1 | FALSE | 2 | 0: 467, 1: 64 |
| id | 0 | 1 | FALSE | 531 | 2: 1, 4: 1, 5: 1, 7: 1 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| birth | 0 | 1 | 84.63 | 1.54 | 81.51 | 83.43 | 84.77 | 85.83 | 87.48 | ▅▆▇▇▅ |
| exit | 0 | 1 | 84.76 | 1.55 | 81.05 | 83.56 | 84.87 | 85.99 | 87.72 | ▂▆▇▇▅ |
| hospstay | 0 | 1 | 47.04 | 63.50 | -295.00 | 21.00 | 40.00 | 64.00 | 797.00 | ▁▇▁▁▁ |
| lowph | 0 | 1 | 7.22 | 0.13 | 6.53 | 7.13 | 7.22 | 7.32 | 7.55 | ▁▁▃▇▂ |
| pltct | 0 | 1 | 204.49 | 80.83 | 16.00 | 148.00 | 204.00 | 256.00 | 571.00 | ▂▇▅▁▁ |
| bwt | 0 | 1 | 1135.61 | 240.04 | 400.00 | 960.00 | 1160.00 | 1330.00 | 1500.00 | ▁▃▆▇▇ |
| gest | 0 | 1 | 29.25 | 2.21 | 23.00 | 28.00 | 29.00 | 31.00 | 36.00 | ▁▇▇▆▁ |
| year | 0 | 1 | 84.63 | 1.54 | 81.51 | 83.43 | 84.77 | 85.83 | 87.48 | ▅▆▇▇▅ |
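The conversion to factors summarized in the tables above can be sketched in base R as follows. The data frame `demo` and its values are invented for illustration; only the column names come from the report, and the report's own conversion code is not shown, so this is one possible way to do it rather than the method actually used.

```r
# Toy stand-in for the cleaned data; values are made up, column names follow the report
demo <- data.frame(
  id     = 1:4,
  twn    = c(0, 1, 0, 0),
  apg1   = c(8, 1, 6, 7),
  vent   = c(1, 0, 1, 1),
  pneumo = c(0, 0, 1, 0),
  pda    = c(0, 1, 0, 0),
  cld    = c(0, 0, 0, 1),
  dead   = c(0, 0, 1, 0),
  bwt    = c(960, 1160, 1330, 1500)
)

# Columns that are codes or identifiers rather than measurements
to_factor <- c("twn", "apg1", "vent", "pneumo", "pda", "cld", "dead", "id")

# Replace each listed column with its factor version; other columns are untouched
demo[to_factor] <- lapply(demo[to_factor], factor)

str(demo)
```

After the conversion, a summary tool counts these columns as factors and reports level counts instead of means, which is the change visible between the two summaries above.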
## [1] "Plot density curves for the quantitative variables"
The hospstay variable has obvious outliers: negative values and values above 300. We remove them.
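A minimal sketch of this filtering step, assuming the rule is "keep rows with `hospstay` between 0 and 300 inclusive". The data frame `demo` is a toy example, and whether the boundary values 0 and 300 are kept is an assumption, since the report does not say.

```r
# Toy data; hospstay values chosen to include both kinds of outlier
demo <- data.frame(
  id       = 1:6,
  hospstay = c(-295, 21, 40, 64, 301, 797)
)

# Keep only non-negative stays of at most 300 days (boundary handling is an assumption)
cleaned <- subset(demo, hospstay >= 0 & hospstay <= 300)

print(cleaned)
```

Note that `subset()` also drops rows where the condition evaluates to `NA`, so missing `hospstay` values would be removed by the same call.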
Color the density plots by the `inout` variable.
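One way to draw per-group density curves is sketched below with base R graphics. The report's plotting code is not shown, so the choice of base graphics, the helper name `plot_density_by_group`, the colors, and the synthetic `demo` data are all assumptions made for illustration.

```r
# Synthetic stand-in data (NOT the report's data)
set.seed(42)
demo <- data.frame(
  lowph = c(rnorm(200, mean = 7.23, sd = 0.12), rnorm(60, mean = 7.14, sd = 0.14)),
  inout = factor(rep(c("born at Duke", "transported"), times = c(200, 60)))
)

# One kernel density estimate per level of inout
dens <- lapply(split(demo$lowph, demo$inout), density)

# Hypothetical helper: overlay the group densities, one color per group
plot_density_by_group <- function(dens, cols = c("steelblue", "firebrick"), xlab = "lowph") {
  xr <- range(sapply(dens, function(d) range(d$x)))
  yr <- range(sapply(dens, function(d) range(d$y)))
  # Empty frame wide enough for every curve
  plot(NA, xlim = xr, ylim = yr, xlab = xlab, ylab = "Density")
  for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
  legend("topleft", legend = names(dens), col = cols, lwd = 2, title = "inout")
}
```

Calling `plot_density_by_group(dens)` draws both curves on shared axes, which makes a shift between the groups easy to see.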
Welch's t-test (`t_test`) was chosen because the smaller group has 80 observations and the group variances are unknown, so equal variances cannot be assumed.
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 lowph born at Duke transported 438 80 5.32 107. 0.000000588
##
## Welch Two Sample t-test
##
## data: cleaned_data_1$lowph by cleaned_data_1$inout
## t = 5.3159, df = 106.63, p-value = 5.876e-07
## alternative hypothesis: true difference in means between group born at Duke and group transported is not equal to 0
## 95 percent confidence interval:
## 0.05424374 0.11876292
## sample estimates:
## mean in group born at Duke mean in group transported
## 7.230751 7.144248
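The Welch test shown above can be reproduced in outline with base R's `t.test()`. The sketch below uses synthetic data with the same group sizes (438 and 80); the means and standard deviations passed to `rnorm()` are assumptions, so the numbers will not match the report. It also recomputes the statistic and the Welch-Satterthwaite degrees of freedom by hand to show what the test does.

```r
# Synthetic stand-in data; NOT the report's cleaned_data_1
set.seed(1)
demo <- data.frame(
  lowph = c(rnorm(438, mean = 7.23, sd = 0.12), rnorm(80, mean = 7.14, sd = 0.14)),
  inout = factor(rep(c("born at Duke", "transported"), times = c(438, 80)))
)

# var.equal = FALSE (the default) requests Welch's test
welch <- t.test(lowph ~ inout, data = demo, var.equal = FALSE)

# Manual computation of the same quantities
x  <- demo$lowph[demo$inout == "born at Duke"]
y  <- demo$lowph[demo$inout == "transported"]
vx <- var(x) / length(x)   # squared standard error of the first mean
vy <- var(y) / length(y)   # squared standard error of the second mean

t_manual  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

print(welch)
```

The fractional degrees of freedom in the report (106.63) are characteristic of this approximation: they fall well below the pooled value of 516 because the smaller group dominates the uncertainty.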
Interpretation: the mean lowest pH is lower in the transported group, so one may hypothesize that survival is lower in this group.
## $hopkins_stat
## [1] 0.7904047
Assess the clustering:
## [1] 0.62383
## $hopkins_stat
## [1] 0.8273205
Assess the clustering:
## [1] 0.728013
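For readers unfamiliar with the `$hopkins_stat` values above, here is a base R sketch of one common definition of the Hopkins statistic. The report does not show how its values were computed, so the function `hopkins_sketch`, the sampling fraction, and the convention used (values near 0.5 suggest no structure, values approaching 1 suggest cluster tendency) are assumptions. Some implementations report the complement or use powered distances.

```r
# Hypothetical helper: Hopkins statistic as sum(u) / (sum(u) + sum(w))
hopkins_sketch <- function(X, m = ceiling(nrow(X) / 10)) {
  X <- as.matrix(X)
  n <- nrow(X)

  # w: distance from m sampled real points to their nearest other real point
  idx <- sample(n, m)
  D <- as.matrix(dist(X))
  diag(D) <- Inf
  w <- apply(D[idx, , drop = FALSE], 1, min)

  # u: distance from m uniform random points (in the bounding box) to the nearest real point
  U <- sapply(seq_len(ncol(X)), function(j) runif(m, min(X[, j]), max(X[, j])))
  U <- matrix(U, nrow = m)
  u <- apply(U, 1, function(p) min(sqrt(colSums((t(X) - p)^2))))

  sum(u) / (sum(u) + sum(w))
}

# Synthetic example: two tight, well-separated clusters
set.seed(123)
clustered <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.1), ncol = 2)
)
h <- hopkins_sketch(clustered)
print(h)
```

In practice the variables would normally be scaled before computing distances, otherwise the variable with the largest range dominates the result.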
Interpretation: A long hospital stay is associated with low birth weight and preterm delivery.
Interpretation: The group with a fatal outcome has lower Apgar scores; the Apgar score is not associated with the number of infants born.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.536 0.9965 0.8910 0.7655 0.51689
## Proportion of Variance 0.472 0.1986 0.1588 0.1172 0.05343
## Cumulative Proportion 0.472 0.6706 0.8294 0.9466 1.00000
Interpretation: the first three components explain 83% of the variance in the data. The main contributors to the first component are birth weight and gestational age at birth; to the second, platelet count; to the third, the lowest pH and the length of hospital stay.
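The 83% figure can be checked directly from the reported standard deviations, since each component's share of variance is its squared standard deviation divided by the total. The second half of the sketch shows how loadings are obtained with `prcomp()`. It runs on synthetic data, and the five column names are an assumption inferred from the interpretation above, not taken from the report's code.

```r
# Standard deviations copied from the "Importance of components" output above
sdev <- c(PC1 = 1.536, PC2 = 0.9965, PC3 = 0.8910, PC4 = 0.7655, PC5 = 0.51689)

# Share of variance per component and its running total
prop_var <- sdev^2 / sum(sdev^2)
cum_prop <- cumsum(prop_var)
print(round(rbind(prop_var, cum_prop), 4))

# Synthetic stand-in for five numeric columns (NOT the report's data)
set.seed(7)
demo <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))
names(demo) <- c("bwt", "gest", "pltct", "lowph", "hospstay")

# Center and scale so that every variable contributes unit variance
pca <- prcomp(demo, center = TRUE, scale. = TRUE)

# Loadings: rows are variables, columns are components
loadings <- pca$rotation
print(round(loadings, 2))
```

With scaled inputs the squared standard deviations sum to the number of variables, which is why the reported values sum to roughly 5. Variables with the largest absolute loading in a column are the main contributors to that component.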